GPU Accelerated Color Correction and Frame Warping for Real-time Video Stitching
Traditional image stitching focuses on a single panorama frame without
considering the spatial-temporal consistency in videos. Applying a
straightforward image stitching approach to the video stitching task causes
temporal flickering and color inconsistency. In addition, inaccurate camera
parameters cause artifacts in the image warping. In this paper, we propose
a real-time system that stitches multiple video sequences into a panoramic
video, based on GPU-accelerated color correction and frame warping without
accurate camera parameters. We extend the traditional 2D-Matrix (2D-M) color
correction approach and present a spatio-temporal 3D-Matrix (3D-M) color
correction method for the local overlap regions, with online color balancing
using a piecewise function on global frames. Furthermore, we use pairwise
homography matrices given by coarse camera calibration for global warping,
followed by accurate local warping based on optical flow. Experimental
results show that our system can generate high-quality panoramic videos in
real time.
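The abstract does not give the 3D-M formulation itself, but the core pattern it describes, correcting color in the overlap region while smoothing across frames to avoid flicker, can be sketched minimally as below. The mean-ratio gain, the exponential smoothing, and all names are our illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch: per-channel gain color correction over an overlap region,
# with temporal smoothing across frames (our stand-in for the 3D-M idea).

def channel_gain(ref, src):
    """Multiplicative gain mapping the source channel's mean onto the reference's."""
    return sum(ref) / max(sum(src), 1e-6)

def correct_overlap(src, gain):
    """Apply a gain to one channel of the source overlap region, clipped to 8 bits."""
    return [min(255.0, v * gain) for v in src]

def smooth_gain(prev_gain, new_gain, alpha=0.2):
    """Exponential moving average over frames to suppress temporal flicker."""
    return (1 - alpha) * prev_gain + alpha * new_gain
```

A per-frame gain alone would flicker whenever the overlap content changes; blending each new estimate into a running average is one simple way to get the temporal consistency the paper emphasizes.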
Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training
Recently, sparse training has emerged as a promising paradigm for efficient
deep learning on edge devices. Current research mainly focuses on reducing
training costs by further increasing model sparsity. However,
increasing sparsity is not always ideal since it will inevitably introduce
severe accuracy degradation at an extremely high sparsity level. This paper
intends to explore other possible directions to effectively and efficiently
reduce sparse training costs while preserving accuracy. To this end, we
investigate two techniques, namely, layer freezing and data sieving. First, the
layer freezing approach has shown its success in dense model training and
fine-tuning, yet it has never been adopted in the sparse training domain.
Nevertheless, the unique characteristics of sparse training may hinder the
incorporation of layer freezing techniques. Therefore, we analyze the
feasibility and potential of using the layer freezing technique in sparse
training and find it has the potential to save considerable training costs.
Second, we propose a data sieving method for dataset-efficient training, which
further reduces training costs by ensuring only a partial dataset is used
throughout the entire training process. We show that both techniques can be
well incorporated into the sparse training algorithm to form a generic
framework, which we dub SpFDE. Our extensive experiments demonstrate that SpFDE
can significantly reduce training costs while preserving accuracy from three
dimensions: weight sparsity, layer freezing, and dataset sieving.
Comment: Published in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
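The abstract describes SpFDE's two ingredients only at a high level; a toy sketch of what a freezing schedule and a loss-based data sieve might look like is given below. The linear schedule, the keep fraction, and every name here are our illustrative assumptions, not SpFDE's actual algorithm.

```python
# Sketch of the two ideas SpFDE combines (hypothetical, not the paper's code):
# progressively freeze leading layers, and "sieve" the dataset down to its
# highest-loss (most informative) examples.

def frozen_layers(epoch, total_epochs, num_layers, max_frozen_frac=0.5):
    """Linearly grow the number of frozen leading layers over training."""
    frac = min(1.0, epoch / max(total_epochs - 1, 1)) * max_frozen_frac
    return int(num_layers * frac)

def sieve(sample_losses, keep_frac=0.6):
    """Return indices of the highest-loss samples to keep for training."""
    k = max(1, int(len(sample_losses) * keep_frac))
    ranked = sorted(range(len(sample_losses)),
                    key=lambda i: sample_losses[i], reverse=True)
    return sorted(ranked[:k])
```

Frozen layers need no gradients or weight updates, and sieved-out samples need no forward passes at all, which is why the two techniques compose with weight sparsity instead of competing with it.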
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Large-scale Transformer models bring significant improvements to various
downstream vision language tasks with a unified architecture. The performance
improvements come with increasing model size, resulting in slow inference
speed and increased serving cost. While certain predictions benefit from the
full complexity of the large-scale model, not all inputs require the same
amount of computation, potentially leading to wasted computational resources.
To handle this challenge, early exiting has been proposed to adaptively
allocate computational power according to input complexity and improve
inference efficiency. Existing early exiting strategies usually adopt output
confidence at intermediate layers as a proxy for input complexity to decide
whether to skip the following layers. However, such strategies cannot be
applied to the encoder in the widely used unified architecture with both
encoder and decoder, because output confidence is difficult to estimate in
the encoder. Ignoring early exiting in the encoder component is therefore
suboptimal in terms of computational savings. To handle this challenge, we
propose a novel early exiting strategy for unified vision language models,
named \textbf{MuE}, which dynamically skips layers in the encoder and decoder
simultaneously based on layer-wise input similarities, allowing multiple
early exits. By decomposing the image and text modalities in the encoder,
MuE is flexible and can skip different layers per modality, improving
inference efficiency while minimizing performance drop. Experiments on the
SNLI-VE and MS COCO datasets show that the proposed approach MuE can reduce
expected inference time by up to 50\% and 40\% while maintaining 99\% and
96\% of performance, respectively.
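The similarity-based exiting criterion the abstract describes can be sketched in a few lines: when consecutive layer outputs become nearly identical, later layers are unlikely to change the representation, so computation stops. The cosine measure, the 0.99 threshold, and the function names are our assumptions for illustration, not values from the paper.

```python
import math

# Hypothetical sketch of similarity-based early exiting in the spirit of MuE:
# exit once a layer's output is nearly parallel to its input.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def run_with_early_exit(layers, x, threshold=0.99):
    """Apply layers in order; stop once the representation stops changing."""
    for depth, layer in enumerate(layers, start=1):
        y = layer(x)
        if cosine(x, y) >= threshold:
            return y, depth  # exit early: representation has saturated
        x = y
    return x, len(layers)
```

Because the test compares a layer's input and output directly, it needs no classifier head, which is exactly why this style of criterion can run inside an encoder where output confidence is unavailable.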
The Lottery Ticket Hypothesis for Vision Transformers
The conventional lottery ticket hypothesis (LTH) claims that there exists a
sparse subnetwork within a dense neural network and a proper random
initialization method, called the winning ticket, such that it can be trained
from scratch to perform almost as well as the dense counterpart. However, the
LTH has scarcely been evaluated for vision transformers (ViTs). In this
paper, we first show that the conventional winning ticket is hard to find at
the weight level of ViTs with existing methods. We then generalize the LTH for ViTs
to input images consisting of image patches inspired by the input dependence of
ViTs. That is, there exists a subset of input image patches such that a ViT can
be trained from scratch by using only this subset of patches and achieve
similar accuracy to the ViTs trained by using all image patches. We call this
subset of input patches the winning tickets, which represent a significant
amount of information in the input. Furthermore, we present a simple yet
effective method to find the winning tickets in input patches for various types
of ViT, including DeiT, LV-ViT, and Swin Transformers. More specifically, we
use a ticket selector to generate the winning tickets based on the
informativeness of patches. Meanwhile, we build another randomly selected
subset of patches for comparison; the experiments show a clear difference
between the performance of models trained with winning tickets and those
trained with randomly selected subsets.
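The abstract's ticket selector is learned from patch informativeness; a minimal stand-in is to score each patch with a cheap proxy and keep the top-k. The variance proxy and all names below are our assumptions, not the paper's selector.

```python
# Hypothetical sketch of a patch "ticket selector": score patches by a simple
# informativeness proxy (pixel variance here) and keep the top-k as the ticket.

def variance(patch):
    """Population variance of a flat list of pixel values."""
    m = sum(patch) / len(patch)
    return sum((p - m) ** 2 for p in patch) / len(patch)

def select_winning_patches(patches, keep):
    """Return indices of the `keep` most informative patches."""
    scored = sorted(range(len(patches)),
                    key=lambda i: variance(patches[i]), reverse=True)
    return sorted(scored[:keep])
```

Flat, low-variance patches (uniform background) carry little information, so a selector like this tends to keep the textured patches, mirroring the paper's comparison between informative and randomly chosen subsets.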
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Vision transformers (ViTs) have recently obtained success in many
applications, but their intensive computation and heavy memory usage at both
training and inference time limit their applicability. Previous compression
algorithms usually start from the pre-trained dense models and only focus on
efficient inference, while time-consuming training remains unavoidable. In
contrast, this paper points out that million-scale training data is
redundant, which is the fundamental cause of the tedious training process. To address
the issue, this paper aims to introduce sparsity into data and proposes an
end-to-end efficient training framework from three sparse perspectives, dubbed
Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy
reduction scheme, by exploring the sparsity under three levels: number of
training examples in the dataset, number of patches (tokens) in each example,
and number of connections between tokens that lie in attention weights. With
extensive experiments, we demonstrate that our proposed technique can
noticeably accelerate training for various ViT architectures while maintaining
accuracy. Remarkably, under certain ratios, we are able to improve the ViT
accuracy rather than compromising it. For example, we achieve a 15.2% speedup
with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and a 15.7% speedup with 79.9%
(+0.1) Top-1 accuracy on DeiT-S. This proves the existence of data redundancy in ViT.
Comment: AAAI 202
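Of the three sparsity levels, the attention-connection level is the easiest to illustrate concretely: keep only the top-k weights per attention row and renormalize. The top-k rule and the function name are our illustrative assumptions, not the paper's exact scheme; the dataset- and patch-level reductions follow the same keep-the-informative-part pattern.

```python
# Hypothetical sketch of token-to-token connection sparsity: zero all but the
# k largest weights in one row of an attention matrix, then renormalize so
# the kept weights still sum to 1.

def sparsify_attention_row(row, k):
    """Keep the k largest weights in an attention row; zero and renormalize the rest."""
    top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    kept = [row[i] if i in top else 0.0 for i in range(len(row))]
    total = sum(kept)
    return [w / total for w in kept]
```

Dropping the near-zero connections changes each token's output very little, which is why this level of sparsity can speed up training while leaving accuracy intact, and occasionally improving it by pruning noisy connections.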
Comparing the Primary and Recall Immune Response Induced by a New EV71 Vaccine Using Systems Biology Approaches
Three inactivated EV71 whole-virus vaccines have completed Phase III clinical trials in mainland China, with high efficacy, satisfactory safety, and sustained immunogenicity. However, the molecular mechanisms by which this new vaccine elicits a potent immune response remain poorly understood. To characterize the primary and recall responses to EV71 vaccines, PBMCs from 19 recipients were collected before and after vaccination with the EV71 vaccine, and their gene expression signatures after stimulation with EV71 antigen were compared. The results showed that the primary and recall responses to EV71 antigen both activated an IRF7-regulated type I interferon and antiviral immune response network. However, up-regulated genes involved in IRF1-regulated T cell activation, inflammatory response, B cell activation, and humoral immune response were observed only in the recall response. The specific secretion of IL-10 in the primary response and of IL-2, IP-10, CCL14a, and CCL21 in the recall response was consistent with the activation of the immune response processes identified in the genes. Furthermore, the expression of MX1 and secretion of IP-10 in the recall response were strongly correlated with NTAb levels at 180 d after vaccination (r = 0.81 and 0.99). In summary, an inflammatory response, an adaptive immune response, and a stronger antiviral response were identified in the recall response.
Heat map of DEGs in primary and recall response.
Colors ranging from blue to red represent the DEGs' average fold change among the subjects (n = 19). (a) Common genes identified in the primary and recall responses; the fold change of these genes was higher in the recall response than in the primary response. (b) Pathways observed only in the recall response, including inflammatory response, antigen processing and presentation, B cell activation, T cell activation, and humoral immune response.